


Enterprise Storage Solutions For Data & Document Management

What's hotter than COLD?

Introducing Nearline for document management and image archives

By Julie Rogers, Application Manager for Document Management Solutions
Storage Technology Corp.

COLD (computer output to laser disc) systems are readily associated with document management and image archiving. Compared with manually intensive microfiche or with high-cost on-line disk, optical jukebox technology offers distinct advantages. But the technology has limitations in capacity, performance, reliability and cost.

The larger an archive becomes and the faster it grows, the more apparent these limitations become. Unless they have a budget the size of Texas, those looking to create a true enterprisewide document and image archive won't be able to afford to do so with jukebox technology alone.

Nearline is an innovative and sophisticated use of proven technology. It enables long-term, high-capacity, low-cost enterprisewide archiving of documents, images and data, as well as automated, random-access retrieval. First introduced in 1987, Nearline has been successful in its traditional data center uses. A breakthrough in random access, together with continued enhancements to the technology, has enabled Nearline to expand beyond the data center and to become an important element in document management implementations.

Cost and capacity

Nearline offers a wide range of capacity options. Smaller departmental or entry-level Nearline systems can store from 50 to 100 GB of information and can grow to over 100 times this size. One gigabyte of storage can hold approximately 500,000 text pages or 20,000 images. High-end Nearline systems can store 25 times the data of the largest entry-level system, enough for billions of text pages and hundreds of millions of images. These high-capacity, high-end systems cost less than $.00002 per page, or less than $.01 per MB.
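The capacity and cost-per-page figures above are simple unit conversions. The Python sketch below reproduces them, assuming decimal units (1 GB = 1,000 MB) and the article's rule of thumb of 500,000 text pages or 20,000 images per gigabyte; the function names are illustrative only.

```python
from __future__ import annotations

# Rules of thumb from the article; decimal units are an assumption.
PAGES_PER_GB = 500_000
IMAGES_PER_GB = 20_000
MB_PER_GB = 1_000


def cost_per_page(cost_per_mb: float) -> float:
    """Convert a storage cost in dollars per MB into dollars per text page."""
    pages_per_mb = PAGES_PER_GB / MB_PER_GB  # 500 pages per MB
    return cost_per_mb / pages_per_mb


def capacity(gigabytes: float) -> tuple[float, float]:
    """Return how many text pages, or alternatively how many images, fit in `gigabytes`."""
    return gigabytes * PAGES_PER_GB, gigabytes * IMAGES_PER_GB


if __name__ == "__main__":
    print(f"${cost_per_page(0.01):.5f} per page")  # $0.00002 per page
    pages, images = capacity(100)  # a 100 GB entry-level system
    print(f"{pages:,.0f} pages or {images:,.0f} images")
```

At $.01 per MB, 500 pages per MB gives the $.00002 per page quoted above.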

No automated system can offer higher capacities or lower costs than Nearline. Traditional COLD vendors know that optical becomes cost-prohibitive as archives grow, and they are rapidly embracing Nearline as the long-term archive of choice. Total cost of ownership includes initial acquisition costs plus ongoing expenses such as maintenance, media, power and cooling, and floor space. Consider a 25-terabyte archive (a reasonable size when documents are stored with seven-year retention). A combination of high-performance and high-capacity Nearline would cost approximately $0.8 million and hold the documents in one Nearline library, whereas the same archive on 12-in. optical would cost over $4 million and require 10 jukeboxes. Nearline offers over $3 million in savings and requires substantially less floor space.
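Total cost of ownership, as defined above, is acquisition cost plus ongoing expenses. Below is a minimal Python sketch of that definition, followed by the savings arithmetic for the article's 25 TB example. The $4 million optical figure is the article's lower bound ("over $4 million"). The inputs in the last line are hypothetical values chosen only to show the function in use.

```python
def total_cost_of_ownership(acquisition: float, annual_ongoing: float, years: int) -> float:
    """Acquisition cost plus ongoing expenses (maintenance, media, power and
    cooling, floor space) accumulated over `years`."""
    return acquisition + annual_ongoing * years


# The article's totals for a 25 TB archive.
NEARLINE_TOTAL = 800_000      # one Nearline library
OPTICAL_TOTAL = 4_000_000     # "over $4 million", ten 12-in. jukeboxes

savings = OPTICAL_TOTAL - NEARLINE_TOTAL
print(f"Savings: ${savings / 1e6:.1f} million")  # Savings: $3.2 million

# Hypothetical inputs: $500,000 up front, $40,000 per year, seven-year retention.
print(total_cost_of_ownership(500_000, 40_000, 7))  # 780000
```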

Figure 1 compares the cost of Nearline with other automated storage options. Disk now lists at about $1.50 per MB; slow or "fat"-capacity disk costs about $1.00 per MB; optical costs about $.60 to $.75 per MB; and Nearline ranges from $.01 to $.20 per MB.
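To show what these list prices mean for a specific archive, the sketch below multiplies each medium's price range by an archive size. It uses the per-MB prices quoted above, assumes 1 TB = 1,000,000 MB, and ignores ongoing costs. The 1 TB archive size is a hypothetical input.

```python
from __future__ import annotations

# List prices per MB quoted in the article, as (low, high) ranges.
PRICE_PER_MB = {
    "disk": (1.50, 1.50),
    "fat-capacity disk": (1.00, 1.00),
    "optical": (0.60, 0.75),
    "Nearline": (0.01, 0.20),
}


def archive_cost(megabytes: float) -> dict[str, tuple[float, float]]:
    """Return the (low, high) list-price cost of storing `megabytes` on each medium."""
    return {medium: (low * megabytes, high * megabytes)
            for medium, (low, high) in PRICE_PER_MB.items()}


if __name__ == "__main__":
    for medium, (low, high) in archive_cost(1_000_000).items():  # hypothetical 1 TB archive
        print(f"{medium:>18}: ${low:,.0f} to ${high:,.0f}")
```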

Microfiche and paper are the most expensive storage media in use today. The cost of paper is estimated at $.03 per page, which works out to $15 per MB at 500 pages per MB. Microfiche costs approximately $1.90 per MB to produce. These figures cover production and storage only. Retrieval costs, which drive up the cost of paper and microfiche substantially, are not reflected in these numbers but are depicted in Figure 2.
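The paper figure follows from the page density used earlier: at 500 text pages per MB, $.03 per page is $15 per MB. Below is a small sketch of that conversion, set against the microfiche and Nearline figures quoted in this article. It assumes the same decimal units, and the helper name is illustrative.

```python
PAGES_PER_MB = 500  # 500,000 pages per GB, assuming 1 GB = 1,000 MB


def per_page_to_per_mb(cost_per_page: float, pages_per_mb: int = PAGES_PER_MB) -> float:
    """Convert a per-page storage cost into a per-MB cost."""
    return cost_per_page * pages_per_mb


paper_per_mb = per_page_to_per_mb(0.03)
microfiche_per_mb = 1.90        # production cost quoted in the article
nearline_per_mb_high = 0.20     # top of the Nearline range quoted above

print(f"paper: ${paper_per_mb:.2f}/MB")  # paper: $15.00/MB
print(f"paper costs {paper_per_mb / nearline_per_mb_high:.0f} times the top Nearline price")
print(f"microfiche costs {microfiche_per_mb / nearline_per_mb_high:.1f} times the top Nearline price")
```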

Performance

The next obvious question concerns performance. In its high-performance configurations, Nearline can deliver a document, a page or a record in under 30 seconds. The fastest configuration presents requested archive data in approximately 25 seconds, just five seconds behind its costly optical counterpart, as shown in Figure 3. The highest-capacity Nearline media, which offers the lowest cost per megabyte, delivers the request in just under 70 seconds.

Nearline combines software and microcode intelligence with magnetic tape technology and automated tape libraries. Tape has always been an excellent storage medium, but it has traditionally been associated with slow recall because access to data is sequential. However, the major tape manufacturers have made great strides in the technology, which now incorporates high-speed search techniques that make tape practical for random-access applications. The indexes used to locate and retrieve documents also record the exact location of each document on the media, so the drive no longer has to read sequentially to find it. High-speed search advances directly to the correct block on the tape and delivers the document or image to the requester.
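The indexing idea above can be illustrated with a toy model. When each document is written, the model records its starting block and length; on retrieval, it seeks straight to that block. The Python sketch below is a hypothetical simulation (the `SimulatedTape` class and its `seek` method stand in for a drive's high-speed search). It is not StorageTek's software or any real tape API.

```python
from __future__ import annotations


class SimulatedTape:
    """In-memory stand-in for a tape volume made of fixed-size blocks."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []
        self.position = 0
        self.blocks_read = 0  # how many blocks were actually read

    def append(self, data: list[bytes]) -> int:
        """Write blocks at the end of the tape; return the starting block number."""
        start = len(self.blocks)
        self.blocks.extend(data)
        return start

    def seek(self, block: int) -> None:
        """Models high-speed search: jump to a block without reading the ones before it."""
        self.position = block

    def read(self, count: int) -> list[bytes]:
        data = self.blocks[self.position:self.position + count]
        self.position += len(data)
        self.blocks_read += len(data)
        return data


def archive(tape: SimulatedTape, index: dict[str, tuple[int, int]], doc_id: str, blocks: list[bytes]) -> None:
    """Write a document and record (start block, block count) in the index."""
    index[doc_id] = (tape.append(blocks), len(blocks))


def retrieve(tape: SimulatedTape, index: dict[str, tuple[int, int]], doc_id: str) -> bytes:
    """Look up the document's location, seek to it, and read only its blocks."""
    start, count = index[doc_id]
    tape.seek(start)
    return b"".join(tape.read(count))


if __name__ == "__main__":
    tape, index = SimulatedTape(), {}
    archive(tape, index, "invoice-1", [b"a1", b"a2", b"a3"])
    archive(tape, index, "invoice-2", [b"b1", b"b2"])
    print(retrieve(tape, index, "invoice-2"), tape.blocks_read)  # b'b1b2' 2
```

Only the two blocks of the requested document are read; the three blocks written before it are skipped by the seek.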

Data transfer rates from tape are unmatched, even by on-line disk, and are far superior to optical. This matters more and more as you consider the time required to write data to archive media. The high-capacity helical-scan tape drives can transfer data in excess of 11 MB per second. This is more than double the data transfer rate of most on-line disk systems, over 500% faster than most optical systems and 2000% faster than CD technology, as illustrated in Figure 4. Data transfer rates matter for retrieval performance as well as for writing documents to the archive: the larger the document, image or file folder being retrieved, the greater the impact of the transfer rate.
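A simple model shows why transfer rate matters more for larger items: retrieval time is a fixed time to first byte plus size divided by transfer rate. The sketch below uses the article's 11 MB per second tape figure. The 20-second time to first byte, the 2 MB per second comparison rate and the item sizes are hypothetical inputs, and the model ignores effects such as compression and host overhead.

```python
def retrieval_seconds(size_mb: float, rate_mb_s: float, first_byte_s: float = 0.0) -> float:
    """Simple model: fixed time to first byte (mount and search) plus size / transfer rate."""
    return first_byte_s + size_mb / rate_mb_s


TAPE_RATE = 11.0   # MB per second, from the article
SLOWER_RATE = 2.0  # MB per second, a hypothetical slower device

if __name__ == "__main__":
    for size_mb in (1, 10, 100):  # hypothetical item sizes
        tape = retrieval_seconds(size_mb, TAPE_RATE, first_byte_s=20)
        slower = retrieval_seconds(size_mb, SLOWER_RATE, first_byte_s=20)
        print(f"{size_mb:>4} MB: {tape:6.1f} s vs {slower:6.1f} s")
```

For a 1 MB item the two devices differ by well under a second, but for a 100 MB file folder the gap grows to roughly 40 seconds.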

Configuration and performance

The most important element in retrieval performance from any robotic device is having a read/write head available at the time of the request. Once a high-performance tape or optical platter is mounted, time to data is roughly the same. But if no read/write device is available when a document is requested, the request is either placed in a queue or terminated.

Most optical devices have a maximum of 2, 4 or 6 read/write heads. Nearline systems, by contrast, offer 8, 10 or 16 per library. In addition, up to 16 Nearline libraries can be connected to form one logical unit with over 250 automated read/write heads available for user requests.
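The effect of read/write head count on queuing can be sketched with a first-come, first-served simulation in which each request occupies a drive for a fixed time. The 30-second drive occupancy and the burst of 12 simultaneous requests below are hypothetical. Real libraries also spend time on robot moves and mounts, which this sketch ignores.

```python
from __future__ import annotations

import heapq


def queue_waits(arrivals: list[float], service_s: float, drives: int) -> list[float]:
    """FIFO wait time of each request when `drives` read/write heads each hold a
    request for `service_s` seconds. `arrivals` must be sorted in time order."""
    free_at = [0.0] * drives  # time at which each drive next becomes free (a min-heap)
    waits = []
    for t in arrivals:
        earliest = heapq.heappop(free_at)  # the drive that frees up first
        start = max(t, earliest)
        waits.append(start - t)
        heapq.heappush(free_at, start + service_s)
    return waits


if __name__ == "__main__":
    burst = [0.0] * 12  # hypothetical: 12 requests arriving at once
    for drives in (4, 16):  # a 4-head jukebox vs. a 16-drive library
        print(drives, "drives: longest wait", max(queue_waits(burst, 30.0, drives)), "s")
    print("heads in 16 connected libraries:", 16 * 16)  # 256
```

With four heads, the last four requests in the burst wait 60 seconds before a drive is free. With sixteen, none of them wait.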

Some optical systems allow "external" drives to be added to the system, which means the robot has no access to those drives. Instead, an operator must locate the requested platter and physically mount it. It is common to find high-end optical users frustrated by long queue times or by having to perform hundreds of manual mounts per day. The response time for these users is deplorable. Most organizations in this predicament buy more jukeboxes to gain more "internal," automated read/write heads, throwing capacity at a performance problem and substantially driving up the cost of the archive system.

Hierarchical storage management

Many articles have been written of late about HSM, or hierarchical storage management. What is it? It is the ability to migrate data, documents and images from one storage medium to another as access patterns change. While documents are in production use, they belong on-line, on disk, where multiple users can access them with subsecond response time. When access to a document or file drops off (when it is needed only for reference or for customer service inquiries), it is moved to another medium for long-term archival. Then, when a user requests it, it is either accessed directly from the archive media or restored to disk.

When microfiche or paper is used for the archive, retrieval requires manual intervention, a very costly alternative. Nearline enables automated access to the data for as long as you want to retain it. Unlike optical, Nearline allows you to migrate data not only from disk to Nearline, but also within the Nearline hierarchy. This means you can move documents and images from disk to high-performance Nearline, then again to high-capacity, lower-cost Nearline. As access to information declines, so should your costs. This objective requires that access remain automated and transparent to the user, which is what HSM aims to accomplish.
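The migration policy described above can be expressed as a rule that maps time since last access to a storage tier. The sketch below uses the three tiers named in the text. The 30-day and 365-day thresholds, the catalog layout and the function names are hypothetical. A real HSM product would apply its own policies and would also move the data, not just plan the move.

```python
from __future__ import annotations

from datetime import date

# (maximum days since last access, tier); thresholds are hypothetical.
POLICY = [(30, "disk"), (365, "high-performance Nearline")]
COLDEST_TIER = "high-capacity Nearline"


def choose_tier(last_access: date, today: date) -> str:
    """Pick the tier a document belongs on, given how long it has been idle."""
    idle_days = (today - last_access).days
    for max_days, tier in POLICY:
        if idle_days <= max_days:
            return tier
    return COLDEST_TIER


def migration_plan(catalog: dict[str, tuple[date, str]], today: date) -> dict[str, tuple[str, str]]:
    """Return {doc_id: (current tier, target tier)} for every document that should move.
    A recently accessed document on Nearline is planned back onto disk."""
    plan = {}
    for doc_id, (last_access, current) in catalog.items():
        target = choose_tier(last_access, today)
        if target != current:
            plan[doc_id] = (current, target)
    return plan


if __name__ == "__main__":
    today = date(1997, 6, 1)
    catalog = {
        "claim-17": (date(1997, 5, 20), "disk"),
        "claim-09": (date(1997, 1, 1), "disk"),
        "claim-02": (date(1995, 1, 1), "high-performance Nearline"),
    }
    for doc_id, (src, dst) in migration_plan(catalog, today).items():
        print(f"{doc_id}: {src} -> {dst}")
```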

Backup

Backup is a critical part of any document management system for several reasons, the foremost being disaster recovery. The archive must be safeguarded against disaster. Legal retention requirements demand it, but so do sound business practices. Corporate and agency data is an invaluable asset that can never be fully recovered if not properly protected.

Data transfer rates play another crucial role here. Optical administrators often must sacrifice full backups because they cannot both write their data to the archive and back it up within a 24-hour period. With the speed of tape, backups can be completed in a fraction of the time, keeping your archives safe and secure. Many software providers who have written interfaces to Nearline devices use duplexing techniques, so that a second copy of the archive is made at the same time as the first. This second copy can be made at the same site and then moved off-site, or it can be written directly to a remote site. This second approach, known as remote vaulting, moves the data rather than the media, which eliminates human handling and therefore dramatically reduces the risk of mishandling.
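Whether the day's archive writes and a full backup fit in 24 hours comes down to volume divided by transfer rate. The sketch below uses the 11 MB per second tape rate quoted earlier. It assumes a single drive and decimal units. The daily volumes and the 3 MB per second comparison rate are hypothetical.

```python
SECONDS_PER_HOUR = 3600
MB_PER_GB = 1_000  # decimal units assumed


def hours_to_transfer(gigabytes: float, rate_mb_s: float) -> float:
    """Hours needed to move `gigabytes` at a sustained `rate_mb_s` on one drive."""
    return gigabytes * MB_PER_GB / rate_mb_s / SECONDS_PER_HOUR


def fits_in_window(write_gb: float, backup_gb: float, rate_mb_s: float, window_h: float = 24.0) -> bool:
    """True if the day's archive writes plus a full backup finish within the window."""
    return hours_to_transfer(write_gb + backup_gb, rate_mb_s) <= window_h


if __name__ == "__main__":
    write_gb, backup_gb = 50, 300  # hypothetical daily writes and archive size to back up
    for rate in (11.0, 3.0):  # tape rate from the article vs. a hypothetical slower device
        hours = hours_to_transfer(write_gb + backup_gb, rate)
        print(f"{rate:4.1f} MB/s: {hours:5.1f} h, fits: {fits_in_window(write_gb, backup_gb, rate)}")
```

Under these assumptions the 350 GB job takes under 9 hours at 11 MB per second but more than 32 hours at 3 MB per second, so only the faster device completes it within a day.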

The inexpensive nature of tape coupled with its extraordinary transfer rates enables crucial backup procedures to be standard business practice rather than a bypassed luxury.

Media life

Magnetic tape is a very reliable and robust medium. Media life for high-quality data tapes exceeds 30 years. Automated tape libraries can extend this life considerably because the cartridges are always handled properly. Tape has been pervasive since its introduction by IBM in the early 1950s, and data recorded in that decade is still accessible today. Just as great strides have been made in the use of tape, so have great strides been made in the media itself, and these improvements only strengthen the projections of its durability.

Standards

Standards have been key to the success and proliferation of tape. Tape has been, and continues to be, the worldwide de facto standard for backup, interchange and disaster recovery. If you lose your data center, you absolutely must be able to read your backup tapes on drives other than the ones that created them. Standards are just as crucial for your long-term document archives, so that they can be retrieved in the decades ahead.

Multiple host support--sharing the Nearline system

Nearline systems can attach to over 25 different computing platforms simultaneously. Multiple applications and hosts can therefore share the investment, making Nearline even more cost-effective than its alternatives. Most optical systems are installed for one application, which makes the initial acquisition difficult to cost-justify. Nearline lets you exploit the investment across corporate boundaries.

In addition to using the system to access reports, documents and images, you can:

* automate backups for all host platforms and network servers;

* use Nearline for remote vaulting of distributed data centers, or as an enterprise archive repository;

* enable many storage-intensive applications that were previously cost-prohibitive.

Costly microfiche can be eliminated and paper production reduced. Transactions and records can be archived and indexed from on-line databases, reducing disk requirements and speeding up batch processing.

The most appealing aspect of the multiple host support is the investment protection it provides. Corporations used to buy hardware first, and then decide which application package to buy. Obviously, the choices were limited by the selected server. Now, companies select software that provides the best solution to their business problems first--then secure the required server.

Because Nearline simultaneously supports so many host systems, you can swap or add platforms in your configuration at any time.

Technological innovations

Why is tape suddenly an option for random-access applications? The answer lies in technological advances. Storage Technology has been in business since 1969, providing competitive tape systems to organizations primarily for use in the data center for backup and batch processing. In 1987, the company introduced Nearline, using automated tape libraries and revolutionizing the way tape could be used in a data center. Backup times were cut dramatically and physical handling and processing errors were eliminated.

In 1995, NearlinePLUS was announced, introducing a new 1x1 architecture (one controller for each tape transport), which made StorageTek drives the fastest-retrieving and highest-capacity tape devices in the world.

In addition, industry-leading software vendors have written applications or interfaces that let their software take full advantage of Nearline speed and capacity. Documents, images and records can be viewed directly from tape without first being recalled to disk. Of course, for applications that require extensive use of a file from the archive, such as research projects, files can be recalled to disk, used and then deleted, while the master remains in the archive.

Tape is not what it was 10, or even five, years ago. Innovations in this technology are impressive and warrant the attention they are now receiving.

Conclusion

Nearline is a cost-cutting, multi-functional, competitive retrieval system for document and image archives. It is flexible, scalable, software- and host-independent, and provides exceptional value to an enterprise.

There are thousands of installed Nearline systems in data centers throughout the world. With the introduction of NearlinePLUS and software vendor integration, there are now hundreds of systems employing Nearline for their document management and image archives.

If you want to build an enterprisewide archive system that will grow as your requirements do, adapt as technology changes, and protect your investment for decades to come, take a look at what the Fortune 1000 have known for many years. Take a look at Nearline and discover what is truly "hotter than COLD".

Storage Technology is a $1.9 billion company that designs, manufactures, markets and services information storage and retrieval subsystems for enterprisewide computer systems and networks worldwide. StorageTek's Web site is located at http://www.stortek.com.

Nearline is a registered trademark of Storage Technology Corporation.

Julie Rogers is the application manager for Document Management Solutions at Storage Technology (Louisville, Colorado). Julie has spent over 14 years in the information technology industry. She can be reached at 303-673-8152, fax 303-661-6221, or e-mail julie_rogers@stortek.com.






© 1995, 1996, 1997 Cardinal Business Media, Inc. All Rights Reserved. The names, logos and icons identifying CBM's products and services are proprietary marks of Cardinal Business Media, Inc. CBM has no liability for content or goods on the Internet except as set forth in the Terms and Conditions of Service.